Bootstrapping information extraction mappings by similarity-based reuse of taxonomies

Authors

  • Martijn Spitters
  • Remko Bonnema
  • Mihai Rotaru
  • Jakub Zavrel
Abstract

Many practical information extraction systems use simple taxonomies for mapping extracted strings to client-specific concept codes. In such taxonomies, concepts are defined as groups of semantically similar words and phrases. For the mapping to be accurate, a new client-specific taxonomy (usually nothing more than a set of concept codes, each with a single description) needs to be enriched with domain-specific terminology variations, which is a very labor-intensive task. In this paper, we describe a method that significantly reduces the manual effort required for this task. Our approach is based on combining multiple existing client-specific taxonomies into a single semantic space. On a set of gold-standard taxonomies, our method achieves an average precision of 91% and a recall of 55%. An additional practice test shows that the method saves at least 62% of the manual effort needed to enrich a new taxonomy.
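The abstract does not say how the shared semantic space is built or which similarity measure is used, so the following is only a minimal sketch of the general idea of similarity-based reuse. It assumes a bag-of-words representation and cosine similarity; the function names, the `threshold` parameter, and the sample taxonomies are all hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase a phrase and count its whitespace-separated tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def enrich(new_taxonomy, existing_taxonomies, threshold=0.5):
    """Suggest phrases for each new concept code by reusing existing concepts.

    new_taxonomy: {code: description}, one description per code.
    existing_taxonomies: list of {code: [phrases]} already enriched by hand.
    threshold: assumed cut-off below which no suggestion is made.
    """
    suggestions = {}
    for code, description in new_taxonomy.items():
        target = bag_of_words(description)
        best_score, best_phrases = 0.0, []
        for taxonomy in existing_taxonomies:
            for phrases in taxonomy.values():
                # Represent an existing concept by pooling all of its phrases.
                pooled = bag_of_words(" ".join(phrases))
                score = cosine(target, pooled)
                if score > best_score:
                    best_score, best_phrases = score, phrases
        suggestions[code] = list(best_phrases) if best_score >= threshold else []
    return suggestions


# Hypothetical sample data, for illustration only.
new_taxonomy = {"N1": "software engineer", "N2": "registered nurse"}
existing_taxonomies = [
    {"A7": ["software engineer", "software developer", "programmer"]},
    {"B3": ["truck driver", "lorry driver"]},
]
suggestions = enrich(new_taxonomy, existing_taxonomies)
```

In this toy setup, the code `N1` inherits the phrase variations of the most similar existing concept, while `N2` receives no suggestions because nothing in the existing taxonomies is close enough. Leaving such codes empty rather than guessing is one way a method of this kind could trade recall for precision.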

Similar resources

Découverte automatique de correspondances entre ontologies

In this thesis, we investigate a principled approach for defining and discovering probabilistic inclusion mappings between two taxonomies, with a clear semantic, in a purpose of collaborative exchange of documents. Firstly, we compare two ways of modeling probabilistic mappings which are compatible with the logical constraints declared in each taxonomy according to a monotony property, then we ...

BootOX: Bootstrapping OWL 2 Ontologies and R2RML Mappings from Relational Databases

In this demo paper we present BOOTOX, a system facilitating ontology and mapping development by their automatic extraction (i.e., bootstrapping) from relational databases. BOOTOX has a number of advantages: it allows to control the OWL 2 profile of the output ontologies, and to bootstrap complex and provenance mappings, which are beyond the W3C direct mapping specification. Moreover, BOOTOX all...

A Comparison Of Efficacy And Assumptions Of Bootstrapping Algorithms For Training Information Extraction Systems

Information Extraction systems offer a way of automating the discovery of information from text documents. Research and commercial systems use considerable training data to learn dictionaries and patterns to use for extraction. Learning to extract useful information from text data using only minutes of user time means that we need to leverage unlabeled data to accompany the small amount of labe...

Rapid Induction of Multiple Taxonomies for Enhanced Faceted Text Browsing

In this paper we present and compare two methodologies for rapidly inducing multiple subject-specific taxonomies from crawled data. The first method involves a sentence-level words co-occurrence frequency method for building the taxonomy, while the second involves the bootstrapping of a Word2Vec based algorithm with a directed crawler. We exploit the multilingual open-content directory of the W...

A Unified Approach for Aligning Taxonomies and Debugging Taxonomies and Their Alignments

With the increased use of ontologies in semantically-enabled applications, the issues of debugging and aligning ontologies have become increasingly important. The quality of the results of such applications is directly dependent on the quality of the ontologies and mappings between the ontologies they employ. A key step towards achieving high quality ontologies and mappings is discovering and r...

Publication date: 2010